Microbial Genomics

Microbiology Society

Preprints posted in the last 7 days, ranked by how well they match Microbial Genomics's content profile, based on 204 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.

1
Flex-It: A global standardised genotyping framework for Shigella flexneri

Hawkey, J.; Nodari, C. S.; Iqbal, Z.; Hunt, M.; Wick, R. R.; Chong, C. E.; Jenkins, C.; Howden, B. P.; Holt, K.; Weill, F.-X.; Baker, K. S.; Ingle, D. J.

2026-04-20 microbiology 10.64898/2026.04.17.719127 medRxiv
Top 0.1%
18.0%

Shigella flexneri is the leading causative agent of shigellosis globally. The public health threat posed by S. flexneri is compounded by its emergence as a sexually transmissible infection, the importance of international travel in driving dissemination, and the increasing prevalence of antimicrobial resistance (AMR). A rapid, robust computational method that is scalable and readily implementable across jurisdictions is needed to enhance genomic surveillance and systematically explore the population structure of this WHO priority pathogen, particularly as vaccine development efforts are underway. Here, we present Flex-It, a genomic framework and genotyping scheme implemented in Mykrobe for S. flexneri serotypes 1-5, X & Y, compatible with previous approaches used to describe S. flexneri's population structure. To develop Flex-It, we curated a retrospective dataset of 5,819 publicly available S. flexneri genomes. We characterised the global population structure for S. flexneri, exploring geographical and temporal traits, and showed the granular diversity of AMR and serotype profiles. We applied Flex-It to >13,000 genomes routinely generated by public health laboratories from Australia, the UK and the USA across a ten-year period. We found significant genotype diversity in all three locations, with the emergence of genotypes with converged resistance to all major drugs currently used for treatment. Flex-It provides an open-source, novel genotyping method that rapidly characterises S. flexneri and its ciprofloxacin resistance determinants in <1 minute from both short and long whole-genome sequencing reads. Flex-It provides the community with a standardised nomenclature to monitor the emergence and spread of S. flexneri lineages.

2
Retrospective analysis of clinical and environmental genotyping reveals persistence of Pseudomonas aeruginosa in the water system of a large tertiary children's hospital in England

Sheth, E.; Case, L.; Shaw, F.; Dwyer, N.; Poland, J.; Wan, Y.; Larru, B.

2026-04-24 infectious diseases 10.64898/2026.04.23.26351604 medRxiv
Top 0.2%
10.0%

Background: Pseudomonas aeruginosa is a major cause of healthcare-associated infections in paediatric settings, where its persistence in moist environments such as hospital water and wastewater systems poses a particular risk to neonates and immunocompromised children. Aim: The aim of this study was to showcase the long-term survival and transmission of P. aeruginosa in a large tertiary children's hospital in England, which is crucial to develop strategies for water-safe care. Methods: Environmental P. aeruginosa isolates were collected from taps, sinks, showers, and baths in augmented care areas of a 330-bed tertiary children's hospital built to NHS water-safety standards. Clinical isolates were classified as invasive (blood, cerebrospinal fluid, and bronchoalveolar lavage) or non-invasive (respiratory, urine, ear, abdominal, and rectal surveillance). Variable number tandem repeat (VNTR) profiles and metadata were extracted from PDF reports, de-identified, deduplicated, and curated using Python and R. Findings: This retrospective study analysed nine-locus VNTR profiles of 457 P. aeruginosa isolates submitted to the UK Health Security Agency from a large tertiary children's hospital, identifying 56 isolate clusters (each with ≥2 isolates), of which 19 (34%) contained at least one invasive isolate. The most persistent cluster (Cluster 1, n=20) spanned from July 2016 to September 2024, containing environmental and clinical (invasive and non-invasive) isolates. Conclusion: These findings demonstrate long-term persistence of certain genotypes and temporal overlap between environmental and clinical isolates, highlighting the difficulty in detecting and eradicating P. aeruginosa in hospital water and wastewater systems and reinforcing the need for continuous rigorous water system controls.

3
Systematic evaluation of 24 extraction and library preparation combinations for metagenomic sequencing of SARS-CoV-2 in saliva

Qian, K.; Abhyankar, V.; Keo, D.; Zarceno, P.; Toy, T.; Eskin, E.; Arboleda, V. A.

2026-04-20 genomics 10.64898/2026.04.16.719115 medRxiv
Top 0.5%
4.4%

Sequencing the respiratory tract transcriptome has the potential to provide insights into infectious pathogens and the host's immune response. While DNA-based sequencing is more standard in clinical laboratories due to its stability, RNA assays offer unique advantages. RNA reflects dynamic physiological changes, and for RNA viruses, viral RNA particles directly represent copies of the viral genome, enabling greater diagnostic sensitivity. However, RNA's susceptibility to degradation remains a significant challenge, particularly in RNase-rich specimens like saliva. To address this, we conducted a systematic, combinatorial evaluation of 24 distinct mNGS workflows, crossing eight nucleic acid extraction methods with three RNA-Seq library preparation protocols. Remnant saliva samples (n = 6) were pooled and spiked with MS2 phage as a control. The SARS-CoV-2 virus was spiked into half of the samples, which were extracted using the eight different extraction methods (n = 3) and compared using RNA Integrity Number equivalent (RINe) scores and RNA concentration. The extracted RNA was then processed across the three library construction methods and subjected to short-read sequencing to assess all 24 combinations head-to-head. We compared methods based on viral read recovery and found that RINe and concentration did not correlate with viral detection. The Zymo Quick-RNA Magbead kit and the Tecan Revelo RNA-Seq High-Sensitivity RNA library kit were the extraction and library-preparation kits that yielded the most SARS-CoV-2 reads, respectively. Importantly, our combinatorial analysis revealed that any small variability attributable to different nucleic acid extraction methods was heavily overshadowed by differences in quality attributable to the RNA-Seq library preparation methods. These findings challenge the reliance on conventional RNA quality metrics for clinical metagenomics and underscore the need to redefine extraction quality standards for mNGS applications.
Importance: mNGS is a powerful and unbiased approach to pathogen detection that has mostly been applied to blood and cerebrospinal fluid samples. However, mNGS has recently been applied to more areas, including respiratory pathogen detection, with potential applications in both in-patient diagnostics and public health surveillance. Saliva samples are an ideal sample type for these use cases since they can be collected non-invasively. However, saliva is also a challenging sample type due to its high RNase activity, and it often yields low-quality nucleic acid. This study explores the feasibility of using saliva specimens in mNGS with contrived SARS-CoV-2 samples to optimize the combination of two factors: nucleic acid extraction and RNA-seq library preparation. Exploration in this area could enhance the sensitivity of saliva-based mNGS assays, with the goal of future expansion of this specimen type in clinical diagnostics and public health surveillance.
Key Points:
- The choice of RNA-Seq library preparation kit has a greater impact on pathogen detection than the nucleic acid extraction method.
- The combination of Zymo Quick-RNA Magbead extraction kit and Tecan Revelo RNA-Seq High Sensitivity RNA library kit recovered the highest percentage of total SARS-CoV-2 reads.
- RNA quantity and RINe score do not correlate with viral read capture, indicating a need for an alternative metric to assess RNA quality for downstream mNGS clinical diagnostics.

4
Integrated Resistome and Quantitative Proteomics Reveal Coordinated Resistance Architecture in MDR and XDR Gram-Negative ICU Pathogens

Lima, A. A.; Silva, D.; Sherman, N. E.; Nogueira, L.; Clementino, M. A.; Havt, A.; Quirino Filho, J.; Sousa, F.; Lima, I. F. N.; Costa, D. D. S.; Ribeiro, S.; Mesquita, F.; Sousa, J.; Lino, L.; Alves, A.; Damasceno, A.; Carneiro, L.; Gondim, R.; Fragoso, L. V.; Rodrigues, J. L.; Miyajima, F.; Carvalho, B.; Maia, M. S.; Arruda, E. A. G. d.

2026-04-20 microbiology 10.64898/2026.04.15.718841 medRxiv
Top 0.6%
3.6%

Objectives: Antimicrobial resistance (AMR) in Gram-negative pathogens is driven by complex and coordinated molecular mechanisms that remain incompletely characterized. This study integrated phenotypic, genomic, and quantitative proteomic analyses to characterize multidrug-resistant (MDR) and extensively drug-resistant (XDR) Gram-negative bacteria circulating in an intensive care unit (ICU) in Northeastern Brazil. Methods: A total of 259 Gram-negative isolates collected between 2019 and 2021 underwent species identification, antimicrobial susceptibility testing, and targeted qPCR for resistance genes. Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa representing susceptible, MDR, and XDR phenotypes were selected for whole-genome sequencing and label-free quantitative proteomics. Differential protein abundance was assessed using Limma with |log2FC| > 1 and p < 0.05. Results: K. pneumoniae (47%), A. baumannii (24%), and P. aeruginosa (21%) predominated. Carbapenem resistance reached 44%, 93%, and 61%, respectively, and MDR/XDR phenotypes occurred in >30% of isolates. Genomic analyses revealed dense resistomes with coexisting β-lactamases (blaKPC, blaNDM, blaCTX-M, OXA) and widespread efflux systems. Proteomic profiling demonstrated phenotype-associated differences in outer membrane proteins, transport systems, regulatory proteins, and metabolic pathways. XDR isolates showed additional enrichment of envelope remodeling proteins, stress response mechanisms, and proteostasis-associated factors. Conclusions: MDR and XDR Gram-negative ICU pathogens exhibit coordinated resistance architecture characterized by accumulation of resistance genes and adaptive proteomic remodeling. Integrated multi-omics approaches provide mechanistic insight into antimicrobial resistance and support improved surveillance and therapeutic strategies.
What is known?
- Antimicrobial resistance is a priority and a serious problem in global health, resulting in high rates of morbidity and mortality.
- Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa are on the World Health Organization's (WHO) priority list as major causes of morbidity and mortality worldwide.
- Classical characterization of susceptibility and resistance phenotypes does not capture the complexity of antimicrobial resistance and hampers effective control measures and actions to minimize the evolutionary dynamics of resistance in these bacteria.
What is new?
- The study characterizes the phenotypic pattern of antimicrobial susceptibility, the presence and sequencing of the resistome and virulome, and analyzes the label-free quantitative proteome of susceptible, MDR, and XDR phenotypes in strains of K. pneumoniae, A. baumannii, and P. aeruginosa circulating in hospital ICUs in Brazil.
- MDR and XDR Gram-negative phenotypes are associated with a dense resistome, with widespread dissemination of beta-lactamase genes (bla_KPC, bla_NDM, bla_CTX-M, and OXA) and RND-type (MEXs) and acrAB-tolC efflux pumps, without changes in virulence genes.
- Proteomic analysis demonstrated increased production of beta-lactamases, components of efflux pump systems, outer membrane protein synthesis, protection for oxidative stress mechanisms, proteins for iron acquisition, and systemic regulators. XDR strains additionally showed enhanced remodeling of the cell envelope, activation of proteostasis, and metabolic adaptation.
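The differential-abundance cutoff quoted in the Methods of this abstract (|log2FC| > 1 and p < 0.05) can be illustrated with a small filter. This is only a sketch of the thresholding step, not the authors' Limma workflow: the `ProteinResult` record, the function names, and the example values are all hypothetical, and the abstract does not say whether the p-value is raw or adjusted.

```python
from typing import NamedTuple


class ProteinResult(NamedTuple):
    """Hypothetical per-protein summary; field names are assumptions."""
    protein: str
    log2_fc: float   # log2 fold change between two phenotypes
    p_value: float   # p-value as reported by the upstream test


def is_differential(result, fc_cutoff=1.0, p_cutoff=0.05):
    """Apply the |log2FC| > cutoff and p < cutoff rule (both strict)."""
    return abs(result.log2_fc) > fc_cutoff and result.p_value < p_cutoff


def filter_differential(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Keep only the records that pass both thresholds."""
    return [r for r in results if is_differential(r, fc_cutoff, p_cutoff)]


# Made-up values for illustration only; none come from the study.
example = [
    ProteinResult("protein_A", 2.3, 0.001),   # passes both thresholds
    ProteinResult("protein_B", -1.4, 0.03),   # passes (strongly decreased)
    ProteinResult("protein_C", 0.8, 0.001),   # fails the fold-change cutoff
    ProteinResult("protein_D", 1.9, 0.20),    # fails the p-value cutoff
]
selected = [r.protein for r in filter_differential(example)]
print(selected)
```

Because both comparisons are strict, a protein with a log2 fold change of exactly 1 is excluded under this reading of the rule.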

5
Oral and plasma microbiome in the context of acute febrile illness

Sy, M.; Ndiaye, T.; Thakur, R.; Gaye, A.; Levine, Z. C.; Ngom, B.; Bellavia, K. L.; Firer, D.; Toure, M.; Ndiaye, I. M.; Diedhiou, Y.; Mbaye, A. M.; Gomis, J. F.; DeRuff, K. C.; Deme, A. B.; Ndiaye, M.; Badiane, A. S.; Paye, M. F.; Sabeti, P. C.; Ndiaye, D.; Siddle, K. J.

2026-04-20 infectious diseases 10.64898/2026.04.16.26351042 medRxiv
Top 0.7%
3.5%

Emerging infectious diseases and antimicrobial resistance (AMR) have surfaced as two major public health threats over the past two decades. Consequently, integrative surveillance systems capable of detecting both emerging pathogens and resistance-carrying bacteria are crucial. With advances in next-generation sequencing, simultaneous detection of pathogens and AMR is increasingly feasible. In this study, we used short-read metatranscriptomics complemented by total 16S rRNA metagenomic long-read sequencing to analyze paired oral and plasma samples from a cohort of febrile individuals at two locations in Senegal. Oral microbiomes differed in community composition between locations, and reduced diversity and richness were significantly associated with high fever. We identified at least one known pathogen in 15.33% (23/150) of samples, with Borrelia crocidurae as the most frequently detected pathogen. We detected both pathogenic and non-pathogenic viruses in oral (10/72) and plasma (9/78) samples. Finally, we observed a high frequency of genes associated with resistance and virulence: 10% of samples expressed at least one AMR gene (ARG), and 24% expressed virulence factor genes. Resistance to widely used beta-lactam antibiotics was the most prevalent. Our findings provide critical data on oral and plasma microbiomes in the context of acute febrile illness in Senegal while expanding understanding of circulating ARGs.

6
Antimicrobial Resistance Profiling and Phenotypic Characterization of Archived Clinical Bacillus paranthracis Strains

Michel, P. A.; Maxson, T.; Chivukula, V.; Overholt, W.; Medina Cordoba, L. K.; Ayodele-Abiola, S.; McQuiston, J.; Beesley, C. A.; Bell, M.; Figueroa, V. C.; Bugrysheva, J.; Chandross-Cohen, T.; Weiner, Z.; Carroll, L. M.; Kovac, J.; Sue, D.

2026-04-19 microbiology 10.64898/2026.04.16.719033 medRxiv
Top 0.8%
2.5%

Bacillus paranthracis was formally defined as a species in 2017, after decades of carrying the name "emetic B. cereus" based on cereulide production and clustering within B. cereus sensu lato phylogenetic group III. Commonly associated with foodborne intoxication, reports rarely link B. paranthracis to non-foodborne clinical illness. As such, the new taxonomy and close resemblance of the name to the biothreat pathogen Bacillus anthracis cause confusion in diagnostic and public health settings. To address this issue, B. paranthracis clinical strains (n=20) from the CDC collection were tested with microbiological methods used for identification of B. anthracis and antimicrobial susceptibility testing. Some B. paranthracis phenotypes were similar to B. anthracis; however, others were inconsistent across strains. Like B. anthracis, 3 strains tested capsule positive, 5 were non-hemolytic on blood agar, and 9 were non-motile. All B. paranthracis strains were resistant to gamma phage lysis, which differentiated them from B. anthracis. Treatment regimens for B. paranthracis infections are not well established, as antimicrobial therapy is not indicated for emetic intoxication caused by B. paranthracis. Notably, six B. paranthracis strains had elevated minimal inhibitory concentrations to anthrax-recommended antibiotics: one for ciprofloxacin, three for doxycycline and tetracycline, and two for clindamycin. Rapid MinION sequencing was assessed for antimicrobial resistance prediction but had limited value when using PiMA v.1. These microbiological observations and susceptibility profiles of B. paranthracis expand our understanding of this pathogen, strengthening our ability to differentiate this bacterium from B. anthracis to improve diagnosis and patient outcomes.
Importance: This study describes in vitro characterization of 20 archived clinical strains of B. paranthracis, an opportunistic pathogen identified more frequently in recent reports. Our findings highlight phenotypic differences and similarities between B. paranthracis and B. anthracis using standard microbiological methods and drug susceptibility profiling. We also assess a rapid B. anthracis-specific MinION long-read genome sequencing workflow with B. paranthracis. This report highlights the overlapping morphological features shared by B. paranthracis and B. anthracis to improve future laboratory diagnosis and strengthen anthrax preparedness. This article will effectively reach an audience of public health professionals and microbiologists strengthening anthrax preparedness.

7
Genomic diversification underlies the broad ecological range of Salmonella enterica serotype Typhimurium

Ohri, L.; Chinnareddy, S.; Goh, Y.-X.; Zhang, H.; Deng, X.; Pruden, A.; Cheng, R.; Li, S.; Liao, J.

2026-04-21 genomics 10.64898/2026.04.16.719076 medRxiv
Top 0.9%
2.0%

Salmonella Typhimurium is a versatile foodborne pathogen with a broad ecological range, making it an ideal model to better understand pathogen adaptations that allow them to infect multiple hosts and persist across diverse environments. We analyzed 745 genomes of S. Typhimurium isolated from three food animal sources (bovine, swine, and poultry) and two non-food animal sources (wild birds and the environment). We found that S. Typhimurium from food animal sources generally had a more open pangenome and harbored more antimicrobial resistance genes (ARGs) than non-food animal sources. Notably, swine isolates exhibited the most open pangenome and prevalent ARGs, likely as a result of horizontal gene transfer primarily mediated by plasmids. Despite similar core genome sizes, S. Typhimurium from different sources displayed distinct patterns of positive selection in the core genome that varied in both frequency and targeted functional categories. In contrast, although accessory genome sizes varied substantially across sources, the frequency of positive selection remained similar. Using machine learning, we further identified genetic variants (e.g., virulence factors) highly predictive of sources. These findings suggest that gain and loss of accessory genes and positive selection acting on core genes facilitate differential adaptation in S. Typhimurium, contributing to its broad ecological range.

8
Analysis of a detoxified Escherichia coli strain for bacteriophage production

Welham, E.; Park de la Torriente, A.; Arng Lee, J.; Keith, M.; McAteer, S. P.; Paterson, G. K.; Gally, D. L.; Low, A. S.

2026-04-21 microbiology 10.64898/2026.04.21.719556 medRxiv
Top 1%
1.3%

Phage therapeutics are re-emerging as adjuncts or alternatives to antibiotics, and their clinical translation will be enhanced with production methods that minimise downstream processing. We evaluated whether an endotoxin-reduced E. coli strain developed for production of recombinant proteins, ClearColi®, can serve as a useful, safe phage production host without compromising yield and whether targeted receptor complementation can expand its utility. The parent strain BL21(DE3) and its lipid A modified derivative, ClearColi®, were compared with respect to infection and generation of phage. Across a panel of 31 phage, a similar host range was observed between BL21(DE3) and ClearColi®. To expand host range, ompC was genetically engineered into the chromosome of ClearColi®, thereby adding OmpC-dependent phage to its production capacity. Production metrics were broadly comparable between the hosts; efficiency of plating and final titres for representative phage were not significantly different; burst size varied by phage but without consistent host bias. Endotoxin activity in ClearColi®-propagated lysates was reduced by over 1000-fold relative to BL21(DE3), reaching the low hundreds of endotoxin units (EU) versus hundreds of thousands for BL21(DE3). Intravesical administration of ClearColi®-derived phage (LUC4) into pigs elicited no clinical abnormalities and no significant increases in circulating cytokines up to 48 hours after administration. ClearColi® allows efficient production of diverse phage with low endotoxin, reducing the requirement for downstream processing. Although its minimal LPS reduces its capacity for producing some LPS-dependent phage and its growth is slower than BL21(DE3), requiring optimisation for maximal phage titre, the safety and simplified manufacturing process support further development of endotoxin modified strains for phage production.
Impact statement: Antibiotic resistance is a current global problem, and treatments based on phage and phage products already have a proven track record with particular bacterial infections, especially in the urinary tract. While progress is being made on in vitro phage synthesis, large scale bacteriophage preparations require a bacterial host for production; consequently, toxic components in the initial lysate need to be removed or significantly diluted for safe clinical use. This is a study of the potential to utilise an endotoxin-reduced E. coli strain, ClearColi®, to produce safer phage therapeutics. Such endotoxin modified strains should minimise the processing steps required and reduce overall production costs of a phage preparation. The research demonstrates that the endotoxin-reduced strain was able to produce a wide range of phage and, for studied examples, at phage titres equivalent to the more toxic parent strain. We also show that the strain can be modified to increase its host range and confirm the very low endotoxicity of basic phage lysates produced by the strain. Replicating this process to engineer additional low-toxicity bacterial production strains will accelerate the development of safer, more cost-effective phage therapeutics.

9
Culturomics unveils species and expands bacterial and fungal diversity in Inuit oropharyngeal microbiota

Flahaut, M.; Leprohon, P.; Pham, N.-P.; Gingras, H.; Bourbeau, J.; Papadopoulou, B.; Maltais, F.; Ouellette, M.

2026-04-20 microbiology 10.64898/2026.04.20.719640 medRxiv
Top 1%
1.2%

Recent advances in high-throughput sequencing and novel culture techniques have revolutionized our understanding of the human microbiota. However, most studies primarily focused on bacterial communities, often overlooking the fungal component. Building upon our previous metagenomic analysis of the Inuit oropharyngeal microbiome [1], this study used culturomics to provide a more comprehensive view of both bacterial and fungal communities. We analyzed oropharyngeal swabs from the Qanuilirpitaa? 2017 Inuit Health Survey [2], demonstrating the complementarity of metagenomic and culturomic approaches. Our findings highlight the importance of culturomics in revealing low-abundance microorganisms, particularly fungi, which are often underrepresented in metagenomics data. Moreover, we designed an approach to isolate previously uncultivated species. We described two Pauljensenia sp., and provided insights into the phylogenetic relationship between Schaalia and Pauljensenia genera. This study underscores the necessity of a holistic approach to microbiome research, combining multiple techniques to fully elucidate microbial diversity in unique populations like the Inuit.

10
MutaPhy: A clade-based framework to detect genotype-phenotype associations on phylogenetic trees

Ngo, A.; Guindon, S.; Pedergnana, V.

2026-04-21 evolutionary biology 10.64898/2026.04.19.719535 medRxiv
Top 1%
1.2%

Understanding how genetic variation in pathogens influences clinical phenotypes observed in infected hosts is a fundamental challenge in evolutionary genomics and public health. Phenotypic traits such as infection severity are often non-randomly distributed within the pathogen's phylogeny, suggesting the existence of evolutionary determinants but also violating the independence assumption underlying classical genome-wide association studies and potentially leading to inflated false positive rates. We present MutaPhy, a phylogeny-based method aimed at detecting correlations between a binary host phenotype and the corresponding pathogen genome by directly utilizing the hierarchical structure of phylogenetic trees. MutaPhy encompasses three different scales: (i) a subtree scale, on which relevant clades over-representing the phenotype of interest are detected using permutation-based tests; (ii) a tree scale, which agglomerates local signals into a global association statistic; and (iii) a site scale, whereby candidate mutational events on branches leading to significant clades are examined using ancestral sequence reconstruction. We evaluate the statistical behavior and detection performance of MutaPhy using simulations under diverse evolutionary scenarios. We also compare this tool to several existing phylogenetic association methods. As illustrative applications, we apply MutaPhy to dengue virus and hepatitis C virus datasets associated with clinical phenotypes in human hosts. Our results highlight the ability of the proposed approach to detect viral lineages associated with over-represented phenotypes while revealing limited evidence for robust mutation-level associations in these particular datasets. Altogether, MutaPhy provides a framework for guiding genotype-phenotype association analyses by leveraging phylogenetic structure, thereby reducing false positive findings and improving the interpretability of association signals.
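The subtree-scale idea in this abstract, testing whether a clade over-represents a binary phenotype by permutation, can be sketched generically. The following is a minimal illustration under stated assumptions, not MutaPhy's implementation: the function name, the choice of test statistic (count of phenotype-positive tips in the clade), the label-shuffling null, and the toy data are all hypothetical, and the abstract does not describe how the tool itself defines its permutations.

```python
import random


def clade_permutation_pvalue(phenotypes, clade_tips, n_perm=2000, seed=0):
    """One-sided permutation p-value for over-representation in one clade.

    phenotypes: dict mapping tip name -> 0 or 1 (binary host phenotype)
    clade_tips: iterable of tip names belonging to the clade under test
    The null shuffles phenotype labels across all tips of the tree.
    """
    rng = random.Random(seed)
    tips = sorted(phenotypes)
    labels = [phenotypes[t] for t in tips]
    clade_set = set(clade_tips)
    clade_idx = [i for i, t in enumerate(tips) if t in clade_set]

    # Observed statistic: number of phenotype-positive tips in the clade.
    observed = sum(labels[i] for i in clade_idx)

    hits = 0
    shuffled = labels[:]
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(shuffled[i] for i in clade_idx) >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy tree with 20 tips; the 5 positive tips all sit in one clade.
toy = {f"tip{i:02d}": (1 if i < 5 else 0) for i in range(20)}
enriched_clade = [f"tip{i:02d}" for i in range(5)]
depleted_clade = [f"tip{i:02d}" for i in range(10, 15)]

p_enriched = clade_permutation_pvalue(toy, enriched_clade)
p_depleted = clade_permutation_pvalue(toy, depleted_clade)
print(p_enriched, p_depleted)
```

In practice many clades are tested on the same tree, so some form of multiple-testing control would be needed on top of this per-clade p-value.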

11
Dietary emulsifiers and host inflammation synergistically drive genomic evolution of Crohn disease-associated E. coli toward enhanced pathogenicity

Rytter, H.; Chevarin, C.; Martin, L.; Bruder, E.; Denizot, J.; Tenaillon, O.; Espeli, O.; Birer, A.; Viennois, E.; Barnich, N.; Chassaing, B.

2026-04-20 microbiology 10.64898/2026.04.20.719593 medRxiv
Top 2%
1.0%

Background and Aims: The rising incidence of Crohn's disease (CD) in Westernized countries has been linked to changes in diet and increased consumption of food additives, yet the mechanisms by which these factors fuel intestinal inflammation remain unclear. Adherent-invasive Escherichia coli (AIEC), a pathobiont involved in CD pathogenesis, lacks a clear genetic hallmark but exhibits intestinal colonization and virulence traits, raising questions about the evolutionary forces promoting its emergence among select individuals. Here, we investigated how chronic exposure to two common dietary emulsifiers, carboxymethylcellulose (CMC) and polysorbate 80 (P80), along with host inflammation, drives AIEC genomic evolution and pathogenic potential. Methods: Wild-type and Il10-deficient mice were monocolonized with AIEC and chronically exposed to CMC, P80, or water. Bacterial isolates were collected and analyzed for genomic diversification, mutations, and phenotype both in vitro and in vivo. Results: Emulsifiers accelerated AIEC genomic diversification and selected for mutations linked to increased motility, invasion, and pro-inflammatory activity. Moreover, dietary emulsifier-evolved strains displayed a marked fitness advantage in vivo, outcompeting their counterparts in murine hosts, with the greatest advantage observed when evolution occurred under inflammatory conditions. Notably, evolutionary pathways and phenotypic outcomes were shaped by both emulsifier and the host's inflammatory status, highlighting synergy between diet and host genetics in fostering pro-inflammatory pathobionts. Conclusion: These findings provide an evolutionary framework connecting modern dietary habits to the emergence of pathogenic AIEC strains, and underscore the importance of dietary interventions in individuals at risk for inflammatory bowel disease.

12
A Pilot Study on the Urinary Microbiome Composition and Diversity in Clinical UTI Samples: A 16S rRNA Analysis

Almamoori, A. A.; Farhan, M. H.; Al-Khafaji, N.; Al_Rahhal, A.

2026-04-19 microbiology 10.64898/2026.04.18.719336 medRxiv
Top 2%
1.0%

This pilot study assessed the composition and diversity of the urinary microbiome from clinically confirmed UTI samples using 16S rRNA sequencing, whilst also exploring inter-individual variability of microbial community structure. We examined ten urine samples from patients with culture-positive UTIs. Demographic and clinical metadata, including age, sex, body mass index (BMI), diabetes status and recent antibiotic exposure, were recorded per sample. Metagenomic DNA was extracted from the samples and sequenced to generate genus-level taxonomic profiles through 16S rRNA gene sequencing. Relative abundance tables were generated for each sample to identify dominant bacterial genera and summarize cohort-level microbial patterns. To evaluate within-sample richness and evenness, alpha diversity indices (Shannon, Simpson, observed features and Chao1) were computed; beta diversity was measured using Bray-Curtis dissimilarity with principal coordinates analysis (PCoA) for graphical representation. The study sample was diverse in sex and moderately diverse in clinical characteristics; all samples were confirmed as having been taken from a UTI patient and showed wide heterogeneity in microbial composition. Overall, Pseudomonas was the dominant genus; however, in specific samples approximately 50% of the microbiome was composed of Klebsiella, Proteus, and Escherichia species, and approximately 25% was made up of Burkholderia spp., which are closely related to the genus of interest in this study. Alpha diversity varied considerably across samples, ranging from communities dominated by a single microbe to highly diverse polymicrobial populations, with the ratios of individual taxa to collective taxa in many samples reflecting the nature of the specimen. Furthermore, a significant degree of beta diversity was found between patients, providing compelling evidence of identifiable differences among the urinary microbiomes of patients with UTI. This pilot project provides a clear indication of the diversity and overall heterogeneity of urinary microbiota found in the UTI patients studied. In addition, the results support the notion that the ecological complexity of the urinary microbiome cannot necessarily be established through conventional culture methods alone, and that culture combined with molecular techniques such as 16S rRNA sequencing of bacterial DNA could be used to quantify and characterize the ecological condition of the urinary microbiota beyond the traditional focus on highly prevalent uropathogens.
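The diversity measures named in this abstract have standard textbook definitions, sketched below for a single table of genus-level counts. This is an illustration under assumptions rather than the authors' pipeline: the function names and example counts are hypothetical, Shannon is computed with the natural logarithm, Simpson is taken as the Gini-Simpson form (1 minus the sum of squared proportions), Bray-Curtis is computed on raw counts, and Chao1 is omitted for brevity.

```python
import math


def _proportions(counts):
    """Relative abundances of the non-zero taxa in one sample."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts if c > 0]


def observed_features(counts):
    """Richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p), natural log assumed."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson(counts):
    """Gini-Simpson index 1 - sum(p^2); higher means more even."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("samples must share the same taxa order")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical genus counts for two samples (same taxa order in both).
sample_mono = [980, 10, 10, 0]      # dominated by a single genus
sample_mixed = [250, 250, 250, 250]  # evenly mixed community

print(observed_features(sample_mono), observed_features(sample_mixed))
print(shannon(sample_mono), shannon(sample_mixed))
print(simpson(sample_mono), simpson(sample_mixed))
print(bray_curtis(sample_mono, sample_mixed))
```

A sample dominated by one genus gives Shannon and Simpson values near zero, while an even community gives the maximum for its richness, which matches the continuum the abstract describes.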

13
Genetic diversity and antimicrobial susceptibility pattern of Shiga toxin-producing Escherichia coli and Campylobacter spp. isolated from healthy goats in southern Thailand

Wiriyaprom, R.; Ngasaman, R.; Kaewnoi, D.; Prachantasena, S.

2026-04-20 microbiology 10.64898/2026.04.18.719346 medRxiv
Top 2%
0.9%

Foodborne illness is a significant public health concern worldwide. Shiga toxin-producing Escherichia coli (STEC) and Campylobacter species are recognized as important zoonotic bacterial pathogens contributing to human infections through the food chain, particularly via foods of animal origin. Although goat meat is in high demand in the southern region of Thailand, studies on foodborne pathogens in this livestock species remain limited. The current study aimed to (i) determine the antimicrobial susceptibility of Campylobacter spp. and STEC isolated from goats, and (ii) analyze the genetic relationships among Campylobacter spp. and E. coli O157 isolates obtained from different sources. Campylobacter jejuni and C. coli isolates were characterized based on sequences of seven housekeeping genes using the Achtman multilocus sequence typing scheme. For E. coli O157:H7, core genome multilocus sequence typing analysis was performed using whole-genome sequencing data. Genetic diversity was observed among C. jejuni, whereas a clonal population structure was detected in C. coli and E. coli O157:H7. Overlapping genetic characteristics were observed between C. jejuni isolates from goats and those previously reported in livestock and humans in Thailand. Among Campylobacter species, resistance to fluoroquinolones, including ciprofloxacin and nalidixic acid, was observed, whereas resistance to fosfomycin was most frequently detected in Shiga toxin-producing E. coli. Tetracycline-resistant isolates were identified in both Campylobacter species and Shiga toxin-producing E. coli at moderate levels. A multidrug-resistant pattern was observed only in C. coli, whereas no multidrug-resistant C. jejuni or Shiga toxin-producing E. coli isolates were detected. These findings indicate that healthy goats may serve as potential reservoirs of zoonotic pathogens and antimicrobial resistance in southern Thailand, where goat meat is frequently consumed.

14
DNAharvester: A Nextflow Pipeline for Analysing Highly Degraded DNA from Ancient and Historical Specimens

Sharif, B.; Kutschera, V. E.; Oskolkov, N.; Guinet, B.; Lord, E.; Chacon-Duque, J. C.; Oppenheimer, J.; van der Valk, T.; Diez-del-Molino, D.; Heintzman, P. D.; Dalen, L.

2026-04-21 bioinformatics 10.64898/2026.04.20.719564 medRxiv
Top 2%
0.9%

Ancient DNA (aDNA) research has advanced rapidly with the development of high-throughput sequencing, which now enables genome-wide analyses of large collections of prehistoric specimens. However, analysing palaeontological and archaeological material with highly degraded DNA constitutes a major bioinformatic challenge. DNA from such samples is characterised by short fragment lengths, low endogenous content, post-mortem damage, and considerable cross-species contamination, which can increase spurious mapping and reference bias, affecting downstream population genetic inferences. Here we present DNAharvester, a modular and reproducible pipeline designed specifically for the processing of highly degraded DNA from ancient and historical specimens. DNAharvester integrates metagenomic filtering before mapping, competitive mapping, adaptive aligner selection (incorporating algorithms such as BWA-aln, BWA-mem, and Bowtie2), and systematic evaluation of reference bias and spurious mapping. By incorporating flexible mapping and filtering strategies, the pipeline can be adapted to varying sample preservation, with a distinct focus on maximising authentic data recovery from highly degraded material. Furthermore, DNAharvester features comprehensive subworkflows for iterative assembly of mitogenomes, identification of genomic repeats and CpG sites, taxonomic classification, microbial/pathogen screening of unmapped reads, genetic sex determination, and variant calling for downstream analyses. To accommodate datasets with varying sequencing depths, the pipeline incorporates multiple variant calling strategies, including diploid variant calling, genotype likelihood estimation, and pseudo-haploid random allele calling. Implemented in Nextflow, DNAharvester provides a highly scalable, containerised framework that enhances reproducibility, portability, and robustness in aDNA analyses. 
We validated the pipeline across a gradient of simulated scenarios and empirical datasets, demonstrating its ability to systematically mitigate complex background contamination while preserving authentic genomic signals even in the most challenging of circumstances. By streamlining complex bioinformatic tasks through simple configuration files, DNAharvester establishes a standardised approach for the rigorous analysis of highly degraded DNA datasets and makes genomic analyses of ancient remains accessible to the broader research community.

15
OpusTaxa: A Unified Workflow for Taxonomic Profiling, Assembly, and Functional Analysis of Shotgun Metagenomes

Chen, Y.-K.; Harker, C. M.; Pham, C. M.; Grundy, L.; Wardill, H. R.; Roach, M. J.; Ryan, F. J.

2026-04-19 bioinformatics 10.64898/2026.04.15.718825 medRxiv
Top 2%
0.9%

Shotgun metagenomics has become a cornerstone of microbiome research, yet the complexity of existing workflows remains a major barrier for life scientists without dedicated bioinformatics support. Manual database setup, detailed sample sheet preparation, and management of software dependencies can make routine analysis difficult and time-consuming. Cross-study comparisons are further hampered by inconsistent processing pipelines, database versions, and profiling strategies, limiting reproducibility and the potential for large-scale meta-analyses. We present OpusTaxa, an open-source Snakemake workflow that provides end-to-end processing of short paired-end shotgun metagenomic data with minimal configuration. Users provide either FASTQ files or Sequence Read Archive accessions; OpusTaxa automatically downloads required databases, performs quality control, removes host reads, and executes taxonomic profiling, metagenome assembly, and functional analysis. All analysis modules can be independently toggled, and per-sample outputs are automatically merged into harmonised, cross-sample tables ready for downstream exploration. Across two public datasets, we demonstrate how OpusTaxa can be used to compare consistency across complementary taxonomic profilers and to estimate microbial load in addition to standard metagenomic workflows. Availability: OpusTaxa is freely available at https://github.com/yenkaiC/OpusTaxa. Documentation, test data, and example configurations are included in the repository.

16
Tuberculosis in households with infectious cases in Kampala city: Harnessing health data science for new insights on an ancient disease with persistent, unresolved problems (DS-IAFRICA TB) study protocol

Nassinghe, E.; Musinguzi, D.; Takuwa, M.; Kamulegeya, R.; Nabatanzi, R.; Namiiro, S.; Mwikirize, C.; Katumba, A.; Kivunike, F. N.; Ssengooba, W.; Nakatumba-Nabende, J.; Kateete, D. P.

2026-04-25 infectious diseases 10.64898/2026.04.23.26351571 medRxiv
Top 2%
0.9%

Tuberculosis (TB) is prevalent in Uganda and overlaps with a high rate of HIV/TB coinfection. While nearly all hospital-based TB cases in Kampala, the capital of Uganda, show clear TB symptoms, 30% or more of undiagnosed TB cases found through active screening are asymptomatic. Additionally, the host risk factors for TB in Kampala cannot be distinguished from environmental risk factors. These TB-specific challenges are just part of the complexity, especially in areas with high HIV/AIDS burden. Data science techniques, especially Artificial Intelligence (AI) and Machine Learning (ML) algorithms, could help untangle this complexity by identifying factors related to the host, pathogen, and environment, which are difficult to explain or predict with traditional/conventional methods. In this project, we will use health data science approaches (AI/ML) to identify factors driving TB transmission within households and reasons for anti-TB treatment failure. We will utilize the computational resources at Makerere University and available demographic, clinical, and laboratory data from TB patients and their contacts to develop AI and ML algorithms. These will aim to: (1) identify patients at baseline (month 0) unlikely to convert their sputum or culture results by months 2 and 5, thus at risk of failing TB treatment; (2) identify household contacts of TB cases who are at risk of developing TB disease, as well as contacts who may resist TB infection despite repeated exposure to M. tuberculosis. Achieving these objectives will provide evidence that data science methods are effective for early detection of potential TB cases and high-risk patients, thereby helping to reduce TB transmission in the community. The study protocol received approval from the School of Biomedical Sciences IRB, protocol number SBS-2023-495.

17
E3 ubiquitin ligase HUWE1 mediates K6-linked polyubiquitylation and stabilization of Nrf2 in an HBx-dependent manner, thereby inhibiting hepatitis B virus replication

Solichin, M. R.; Deng, L.; Felisha, H.; Krisnugraha, Y. P.; Matsui, C.; Abe, T.; Ryo, A.; Watashi, K.; Muramatsu, M.; Shoji, I.

2026-04-20 microbiology 10.64898/2026.04.20.719611 medRxiv
Top 2%
0.8%

We previously reported that the oxidative stress sensor Kelch-like ECH-associated protein 1 (Keap1) recognizes hepatitis B virus (HBV) X protein (HBx) to activate the NF-E2-related factor 2 (Nrf2) signaling pathway, thereby inhibiting HBV replication, and that HBx promotes K6-linked polyubiquitylation of Nrf2. However, the molecular mechanism remains unclear. Here, we investigated the role of HECT, UBA, and WWE domain-containing E3 ubiquitin ligase 1 (HUWE1) in HBx-mediated K6-linked polyubiquitylation of Nrf2 and its impact on HBV replication. Cell-based ubiquitylation assays demonstrated that HUWE1 knockdown reduced HBx-mediated K6-linked polyubiquitylation of Nrf2, while overexpression of wild-type HUWE1, but not the catalytically inactive HUWE1(C4341A) mutant, enhanced it, indicating that HUWE1 E3 ligase activity is required. Coimmunoprecipitation and proximity ligation assays demonstrated that HUWE1 interacts with HBx in the cytoplasm and binds Nrf2 only in the presence of HBx, suggesting that HBx bridges HUWE1 and Nrf2 into a ternary complex. Cycloheximide chase assays demonstrated that HUWE1 knockdown destabilized Nrf2 in HBx-expressing cells, supporting a role for HUWE1 in Nrf2 stabilization via K6-linked polyubiquitylation. Furthermore, HUWE1 knockdown or treatment with the HUWE1 inhibitor BI8626 significantly increased HBV RNA and pgRNA levels in HBV-infected cells. Collectively, these results demonstrate that HUWE1 promotes K6-linked polyubiquitylation and stabilization of Nrf2 in an HBx-dependent manner to inhibit HBV replication. IMPORTANCE: Hepatitis B virus (HBV) chronically infects approximately 254 million people worldwide, yet host mechanisms that restrict viral replication remain incompletely understood. The Kelch-like ECH-associated protein 1 (Keap1)/NF-E2-related factor 2 (Nrf2) signaling pathway is a central defense against oxidative stress. Under basal conditions, Nrf2 is degraded via Keap1/Cullin3-mediated K48-linked polyubiquitylation. 
We previously demonstrated HBV infection promotes Nrf2 stability through non-canonical K6-linked polyubiquitylation. Here, we identify the E3 ubiquitin ligase HUWE1 as the enzyme responsible for K6-linked polyubiquitylation of Nrf2. HBV X protein (HBx) recruits HUWE1 to Nrf2, forming a HUWE1/HBx/Nrf2 complex that switches Nrf2 ubiquitylation from K48 to K6, stabilizing Nrf2 and suppressing HBV replication. These findings reveal a novel antiviral mechanism exploiting a non-canonical ubiquitin code and highlight HUWE1 as a potential therapeutic target against chronic HBV infection.

18
intI1 predicts ARGs and human source tracking markers carried by coprophagous flies in Maputo, Mozambique

Heintzman, A. A.; Cumbe, Z. A.; Cumbane, V.; Cumming, O.; Holcomb, D.; Keenum, I.; Knee, J.; Monteiro, V.; Nala, R.; Brown, J.; Capone, D.

2026-04-21 occupational and environmental health 10.64898/2026.04.19.26351253 medRxiv
Top 2%
0.8%

Wastewater surveillance is increasingly used for antimicrobial resistance (AMR) monitoring in urban environments, but low-resource settings often lack a piped sewerage system. Instead, coprophagous flies--flies that ingest feces--may serve as composite samplers for monitoring fecal wastes present in terrestrial environments. We evaluated whether the class 1 integron-integrase gene intI1 was associated with genetic markers of AMR and fecal source tracking markers (FST) in coprophagous flies collected from latrine entrances and food preparation areas in low-income urban Maputo, Mozambique. We quantified intI1, an enteric 16S rRNA target (for normalization), three FST markers, and 30 ARG targets using qPCR. We normalized concentrations of intI1 and each target to enteric 16S rRNA. We fit linear mixed models with a random intercept for housing compound to estimate within-fly associations between log10 relative abundance of intI1 and log10 relative abundance of each target with and without adjustment for fly taxonomic group, capture location, and standardized fly mass. We also modeled per-fly unique ARG count (i.e., number of ARG targets detected) using Poisson regression. Of 188 flies assayed, 176 passed internal controls; intI1 and enteric 16S rRNA were detected in 95% and 96% of flies, respectively. Higher relative abundance of intI1 was positively associated with ARG and FST targets, with the strongest associations observed for sulfonamide- (sul1: β = 0.87; 95% CI: 0.81, 0.94; sul2: β = 0.81; 95% CI: 0.73, 0.89), tetracycline- (tetA: β = 0.78; 95% CI: 0.70, 0.85; tetB: β = 0.69; 95% CI: 0.60, 0.79), and trimethoprim-related (dfrA17: β = 0.78; 95% CI: 0.70, 0.86) genes. Associations with FST markers were weaker (i.e., human mtDNA: β = 0.46; 95% CI: 0.37, 0.55; human-associated Bacteroides: β = 0.34; 95% CI: 0.25, 0.43). 
Higher relative abundance of intI1 was also associated with a greater number of ARGs detected: each 10-fold increase in intI1 was associated with an 8% higher expected unique ARG count (aRR=1.08, 95% CI: 1.04-1.12). These findings support the need for further research across different settings exploring intI1 carried by coprophagous flies as a potential standardized screening target for AMR surveillance in unsewered terrestrial environments.
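
As a worked illustration of the reported rate ratio, the sketch below shows how an aRR of 1.08 per 10-fold increase in intI1 scales to other fold changes. It assumes a standard log-link Poisson model with log10 relative abundance of intI1 as the predictor; the function names are hypothetical and this is not the authors' code.

```python
import math

# Adjusted rate ratio per 10-fold (one log10 unit) increase in intI1,
# as reported in the abstract above.
ARR_PER_LOG10 = 1.08


def log10_relative_abundance(target_copies: float, reference_copies: float) -> float:
    """log10 of a target concentration normalized to a reference (e.g. enteric 16S rRNA)."""
    return math.log10(target_copies / reference_copies)


def expected_count_ratio(fold_change: float, rr_per_log10: float = ARR_PER_LOG10) -> float:
    """Multiplicative change in expected unique ARG count for a given fold change in intI1.

    Assumes a log-linear (Poisson, log link) model, so effects multiply
    across log10 units: ratio = rr ** log10(fold_change).
    """
    return rr_per_log10 ** math.log10(fold_change)


if __name__ == "__main__":
    for fold in (10, 100, 1000):
        print(f"{fold}-fold increase in intI1 -> ratio {expected_count_ratio(fold):.4f}")
```

Under this assumption, a 100-fold increase corresponds to 1.08 squared, or roughly a 17% higher expected ARG count, rather than 16%.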

19
Network-Based Functional Fragility Reveals System-Level Reorganization Of The Gut Microbiome In Inflammatory Bowel Disease

Kenavdekar, M. V.; Natarajan, E.

2026-04-21 bioinformatics 10.64898/2026.04.16.719113 medRxiv
Top 2%
0.7%

The human gut microbiome plays a critical role in host health, yet its functional organization in disease remains poorly understood. Most studies focus on taxonomic composition or pathway abundance, which fail to capture higher-order interactions governing system-level behavior. Here, we investigated microbiome functional organization in inflammatory bowel disease (IBD), including Crohn's disease (CD), ulcerative colitis (UC), and healthy controls (HC), using a network-based framework across 60 metagenomic samples. Functional pathway profiles were used to construct correlation-based interaction networks, followed by analysis of network topology, functional redundancy, keystone pathway architecture, and system robustness. Disease-associated networks (CD and UC) exhibited reduced global connectivity, increased modular fragmentation, and centralization of keystone pathways, indicating a shift from distributed organization to more fragmented and fragile network structures compared to healthy controls. Notably, machine learning models demonstrated that network-derived features achieved higher classification performance (accuracy up to 0.824) compared to redundancy-based measures. These findings reveal that microbiome dysfunction in IBD is driven by large-scale reorganization of functional interaction networks rather than loss of functional capacity. This study highlights the importance of network-level analysis in understanding microbiome-associated disease and provides a systems-level framework for future research.

20
REPLAY: A reproducible and user-friendly application for DNA replication timing analysis from Repli-seq data

Dickinson, Q.; Yu, C.; Rivera-Mulia, J. C.

2026-04-21 genomics 10.64898/2026.04.16.719037 medRxiv
Top 2%
0.7%

Background: DNA replication timing (RT) is a fundamental feature of genome organization that is regulated in a cell-type-specific manner and frequently altered in disease. Repli-seq is the standard approach for genome-wide RT profiling; however, its analysis typically requires multiple independent tools and custom scripts, limiting reproducibility, portability, and accessibility, particularly for users without computational expertise. In addition, existing workflows often lack standardization and require substantial user intervention. Results: We developed REPLAY, a fully automated, reproducible, and user-friendly application for replication timing analysis. REPLAY is distributed as a standalone executable that enables end-to-end processing from compressed FASTQ files to genome-wide RT profiles without requiring software installation or programming experience. Through an intuitive graphical interface, users can configure analysis parameters, including input and output directories, reference genome, normalization strategy (quantile, median, or interquartile range), and smoothing. The application integrates all processing steps (quality control, trimming, alignment, binning, RT log2 calculation, normalization, smoothing, and visualization) within a single automated workflow. Application of REPLAY to publicly available datasets demonstrates accurate reconstruction of RT profiles and high reproducibility across samples. Conclusions: REPLAY offers a portable, reproducible, and accessible solution for the analysis of RT data. By eliminating the need for command-line tools and complex installations, it lowers the entry barrier, enabling standardized analysis across diverse research settings.
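
To make the "RT log2 calculation, normalization, smoothing" steps concrete, here is a minimal sketch of how such a computation is commonly done for two-fraction (early/late) Repli-seq data. It is not REPLAY's implementation: the early/late design, the pseudocount, median centering as the normalization, and a simple moving average standing in for the smoother are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from statistics import median


def log2_ratio(early, late, pseudocount=1.0):
    """Per-bin log2(early / late) read-count ratio; the pseudocount avoids division by zero."""
    return [math.log2((e + pseudocount) / (l + pseudocount)) for e, l in zip(early, late)]


def median_center(values):
    """Shift values so that their median is zero (a simple median normalization)."""
    m = median(values)
    return [v - m for v in values]


def moving_average(values, window=3):
    """Smooth with a centered moving average, shrinking the window at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Toy read counts per genomic bin for early and late S-phase fractions.
    early = [30, 28, 15, 7, 3]
    late = [6, 8, 15, 27, 31]
    rt = moving_average(median_center(log2_ratio(early, late)))
    # Positive values indicate earlier replication, negative values later replication.
    print([round(v, 3) for v in rt])
```

In practice the bin size, normalization strategy, and smoothing method would be set by the tool's configuration; the sketch only shows the order of operations described in the abstract.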